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Abstract 

The Milankovitch theory states that the orbital eccentricity, precession, and obliquity of the Earth influence our climate by modu¬ 
lating the summer insolation at high latimdes in the northern hemisphere. Despite considerable success of this theory in explaining 
climate change over the Pleistocene epoch (2.6 to 0.01 Myr ago), it is inconclusive with regard to which combination of orbital 
elements paced the lOOkyr glacial-interglacial cycles over the late Pleistocene. Here we explore the role of the orbital elements 
in pacing the Pleistocene deglaciations by modeling ice-volume variations in a Bayesian approach. When comparing models, 
this approach takes into account the uncertainties in the data as well as the different degrees of model complexity. We And that 
the Earth’s obliquity (axial tilt) plays a dominant role in pacing the glacial cycles over the whole Pleistocene, while precession 
only becomes important in pacing major deglaciations after the transition of the dominant period from 41 kyr to 100 kyr (the 
mid-Pleistocene transition). We also And that geomagnetic held and orbital inclination variations are unlikely to have paced the 
Pleistocene deglaciations. We estimate that the mid-Pleistocene transition took place over a 220 kyr interval centered on a time 
715 kyr ago, although the data permit a range of 600-1000 kyr. This transition, occurring within just two 100 kyr cycles, indicates 
a relatively rapid change in the climate response to insolation. 

Keywords: glacial cycles; obliquity; Pleistocene; Bayesian inference; mid-Pleistocene transition; climate model 


1. Introduction 


During the past 1 Myr (the late Pleistocene), the polar 
ice sheets grew slowly (glaciation) then retreated abruptly 
(deglaciation or glacial termination) repeatedly, with an inter¬ 
val of about 100kyr ( [Hays et ah 1976| ). These quasi-periodic 
glacial-interglacial cycles dominated terrestrial climate change. 
They are recorded by paleoclimatic proxies such as (the 
scaled isotope ratio) in foraminiferal calcite, which is 

sensitive to changes in global ice volume and ocean temper- 
amre. Eollowing on from the work of Adhemar, Croll, and 
others, Milankovitch proposed that climate change is driven by 
the insolation (the received solar radiation) during the northern 
hemisphere summer at northerly latitudes ( |Milankovic| 1 194 1) . 
This insolation depends on the Earth’s orbit and axial tilt (obliq¬ 
uity), and Milankovitch suggested that through various climate 
response mechanisms, variations in these orbital elements - in 
particular eccentricity, obliquity, and precessiorQ- can cause 
climate change (“Milankovitch forcing”). Many studies have 
broadly confirmed Milankovitch’s theory and the role of Mi¬ 
lankovitch forcing in driving Pleistocene climate change, for 
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example by spectral analyses of paleoclimatic time series de¬ 
rived from deep-sea sediments (Hays et ah]|1976{ |Shackleton| 


and Opdyke 1973[ Kominz et al. 1979| l. These studies have 
demonstrated that the climate variance is concentrated in peri¬ 
ods of about 19 kyr, 23 kyr, 42 kyr and 100 kyr which are close 
to the dominant periods in precession (~23 and 19 kyr), obliq¬ 
uity (~41 kyr), and eccentricity (~100 and 400 kyr). 

There are, however, several difficulties in reconciling the Mi¬ 
lankovitch theory with observation. Two in particular arise 
when trying to explain the 100 kyr cycles. The first is the tran¬ 
sition from the 41 kyr dominant period in climate variations to 
a 100 kyr dominant period at the mid-Pleistocene around 1 Myr 
ago (hereafter “Myr ago” is written “Ma”). The second dif¬ 
ficulty is generating 100 kyr sawtooth variations from orbital 
forcings and climate response mechanisms ( |Imbrie et al.|1993 ] 
Huybers|2007 Lisiecki|2010 l. On the one hand, and as shown 
in Eigure [1] the onset of 100 kyr power at the mid-Pleistocene 
transition (MPT) occurs without a corresponding change in the 
summer insolation at high northern latitudes (represented by the 
daily-averaged insolation on 21 June at 65°N). On the other 
hand, the ~ 100 kyr eccentricity cycle produces only negligible 
100 kyr power in seasonal or mean annual insolation variations, 
despite its modulation of the precession amplitude. Eurther- 
more, the variations of eccentricity and the northern summer 
insolation are weak while the 100 kyr climatic variations are 
strong, notably in marine isotope stage (MIS) 11 (see Eigure 
and Imbrie and Imbrie (T980|; Howard ( 1997| l). These prob¬ 
lems are referred to as the “100kyr problem” (Imbrie et al.[ 
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Figure 1: Climate variations over the P leistocene. The present day is at time zero on the right. The record (lower solid line) 
stacked by Lisiecki and Raymo (2005 i is compared with the daily-averaged insolation at the summer solstice at 65°N, Q‘^^^65°N 
(upper solid line), the obliquity (dashed line), and the eccentricity (dotted line) calculated by Laskar et al. ( |2004| l. The latter two 
have been scaled to have a common amplitude. The grey region around -lOOOkyr represents the MPT extending from -1250kyr 
to -770kyr (Clark et al. 2006| l. The grey bar extending from -423 to -362kyr represents marine isotope stage (MIS) 11. The 
variations are dominated by 41 kyr and 100 kyr cycles before and after the MPT respectively. 


[T99^ . 

Various models with different climate forcings and response 
mechanisms have been proposed to solve the 100 kyr problem. 
Many are based on either deterministic climate forcing models 
or stochastic internal climate variations. The former proposes 
that the 100 kyr cycles are driven by orbital variations, particu¬ 
larly precession and eccentricity (Imbrie and Imbrie[ |1980 [ [PaiT] 
|lard| |1998| |Gildor and Tziperman 2000| l. Many models treat 
the insolation variation as a pacemaker which sets the phase of 
the glacial-interglacial oscillation by directly controlling sum¬ 
mer melting of ice sheets ( |Gildor and Tziperman] |2000| ). In this 
latter hypothesis, stochastic internal climate variability plays 
the main role in generating the 100 kyr glacial cycles ( |Saltz -| 
1982 Pelletier 2003 Wunsch 2003| l. A general approach 


is to combine the deterministic and stochastic elements within a 
framework of nonlinear dynamics, which allows for the occur¬ 
rence of bifurcation and synchronisation in the climate system 
(see review by |Crucifix|2012| l. 

Other proposed hypotheses include glaciation cycles con¬ 
trolled by the accretion of interplanetary dust when the Earth 
crosses the invariable plane ( |Muller and MacDonald) |1997[ ) or 
by the cosmic ray flux modulated by the Earth’s magnetic field 
(measured as the geomagnetic paleointensity, GPI;|Christl et al.| 
200^|Courtillot et al.|2007)l. Some models also try to explain 


the MPT with (Raymo et al. 1997 Paillard |1998 Honisch 


et al. 2009 Clark et al.| |2006[ ) or without ( |Huybers 2009 [ 


Lisiecki 2010t Imbrie et al. 201 1[) an internal change in the 


climate system. 

The above models comprise both climate forcings and re¬ 


spouses. According to various studies (Saltzman 

1987 

Maasch 

and Saltzman 1990 Ghil 

1994 Raymoetal.il 

19^ 

Paillardl 

1998||Clarketal.||1999 

'ziperman and Gildor 

2003 

Ashke-| 

nazy and Tziperman 200 

4|, climate forcings frequently deter- 


mine the time of occurrence of some climate feature, such as 
the onset of deglaciation. Many recent studies have employed 
concepts from chaos theory to address the problem of climate 


change ([Crucifix 

2012[ Parrenin and Paillard| 2012[ Crucifix 

120131 [Mitsui anc 

Aihara 20141 Ashwin and Ditlevsenl 2015 

Williamson and Lenton 20151, which then allow the concept 


of "pacing" to be described more rigorously as a forcing mech¬ 
anism. Huybers ( 201 l| l noted that many tens of pacing models 
have been proposed, yet we lack the means to choose between 
them. 

Our current work aims to compare different forcing mecha¬ 
nisms by using a simple ice volume model for the Pleistocene 
glacial-interglacial cycles. We adopt the pacing model given 
by [Huybers and Wunsch| ( |2005| l and combine it with different 
forcings in order to predict the glacial terminations, which are 
identified from several records. Our models do not de¬ 
scribe the physical mechanism of the climate response to exter¬ 
nal forcings. We aim instead only to measure the role of differ¬ 
ent forcings in determining the times of deglaciations. Due to 
the large and rapid change in ice volume at deglaciation, these 
times are relatively easy to identify, so the time uncertainties 
associated with identification are small. They are nonetheless 
still affected by the overall uncertainty in the chronology of the 
b'*0 record (Huybers and Wunsch 2005 i. 


A common approach for assessing a model is to use p-values 
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to reject a null hypotheses (|Hu ybers and Wunsch| |200 5t |Huy-| 
bersl 12011||. However, it is well established that p-values can 


these three data sets, we also analyze the orbital-tuned ben- 


give very misleading results ( 

Berger and Sellke 

1987] Jaynes set), despite its orbital assumptions. The LR04 record was re- 

2003 Christensen 2005 Bai 

er-Jones 

2009 Feng and Bailer- calibrated by H07 to generate a tuning-independent LR04 data 

Jones] 2013|l, so we instead compare models using the Bayesian set (“LRH” data set; see the supplementary material of H07 for 


evidence. This compares models on an equal footing and takes 
into account the different flexibility (or complexity) of the mod¬ 
els (Kass and Raftery |1995[ Spiegelhalter et al. [2002 [[\^ Tou- 


Issaintl 20111. 


This paper is organized as follows. In section|^we assemble 
the data - stacked records - and identify the glacial ter¬ 
minations. In section]^ we summarize the Bayesian inference 
method as we use it. We build models based primarily on or¬ 
bital elements to predict the Pleistocene glacial terminations in 
section 12 These are compared for different data sets and time 
scales in section|2 We perform a test of sensitivity of the results 
to the model parameters and choice of time scales in section |2 
Finally, we discuss our results and conclude in sectionj^ 

2. Data 

2.7. from a depth-derived age model 

The past climate can be reconstructed from isotopes recorded 
in ice cores or deep sea sediment cores. Air bubbles trapped at 
different depths in ice cores can be used to reconstruct the past 
atmospheric temperature, for example. Ice cores have so far 
been used to trace the climate back to about 800 kyr ( |Augustin| 
|et al.| 2004 [ ). In order to reconstruct the climate back to 2 Ma, 
the b'^O ratio recorded in the calcite (CaCOs) in foraminifera 
fossils (including species of benthos and plankton) in ocean 
sediment cores can be used. We use the b'^O ratio as a mea¬ 
sure of variations in the global ice volume, although we note 
that this is also sensitive to the temperature and isotope com¬ 
position of seawater, for which corrections can be made. For a 
discussion of the interpretation of marine calcite b^^O see for 
example Shackleton (1967 1 and Mix and Ruddiman ( 1984| l. 

In order to calibrate b**^0 measurements and to assign ages 
to sediment cores, one could assume either a constant sedi¬ 
mentation rate (determined using radiometric ally dated geo¬ 
magnetic reversals), or a constant phase relationship between 
b'^O and an insolation forcing based on the Milankovitch the¬ 
ory (see [Huybers and Wunsch||2004 for details). The former 
is the “depth-derived age model” ( Huybers and Wunsch| |2004[ 
Huybers 20071. The latter is referred to as “orbital tuning’ jE 


brie et al. 1984[|Martinson et al.||1987]|Shackleton et al.[|1990| l. 

Clearly this latter method is not appropriate for testing theories 
related to Milankovitch forcings, because it already assumes a 
link between b^^O variations and orbital forcings. 

[Huybers I ( |2007| l (hereafter H07) stacked and averaged twelve 
benthic and five planktic b'^O records to generate three b^^O 
global records; an average of all b^^O records (“HA” data set); 
an average of the benthic records (“HB” data set); an aver¬ 
age of the planktic records (“HP” data set)0 In addition to 


details). 

We standardize each of the above b'*0 records over the past 
2 Myr to have zero mean and unit variance, to produce what we 
call the b'^O anomalies as shown in Figure 0 (DD, ML, MS 
are explained below). We identify the deglaciations in the next 
section. We see that the sawtooth 100 kyr glacial-interglacial 
cycles become significant over the late Pleistocene while 41 kyr 
cycles dominate over the early Pleistocene. From now on, we 
will use the term “late Pleistocene” to mean the time span 1 Ma 
to OMa, and “early Pleistocene” to mean 2 Ma to 1 Ma. 

2.2. Identification of deglaciations 

Rather than trying to model the full time series of b^^O varia¬ 
tions, we focus instead only on the times of glacial terminations 
(deglaciations). This is because an orbital forcing should de¬ 
termine predominantly the timing of a deglaciation rather than 
the detailed variation of the ice volume ( Gildor and Tzipermanj 
|2000[ jPaillardl |1998[ [Huybers and Wunschj 2005| l. This not 
only simplifies the problem (thus making results more robust), 
but is also in line with our goal of trying to identify the main 
pacemakers for deglaciations, rather than trying to model the 
continuous response of the climate to orbital forcings. Here we 
describe how we identify the deglaciations. 

From Figure [2 we see that the b'^O amplitudes are larger in 
the late Pleistocene than in the early Pleistocene. This is inter¬ 
preted to mean that after the MPT, ice sheets both grew to larger 
volumes and retreated more rapidly to ice-free conditions. This 
rapid and abrupt shift from extreme glacial to extreme inter¬ 
glacial conditions defines 11 well-established late-Pleistocene 
terminations ( [Broecker|[1984||Raymo[[1997| l. Because termina¬ 
tion 3 is sometimes split into two terminations ( [Huybers] [201 l| l 
- labeled 3a and 3b (Figure j^l - we actually identify 12 ma¬ 
jor terminations over the late Pleistocene. The times of these 
major terminations as established by various publications has 
been collated by [Huybers] ( [2011[ ) and are given in his supple¬ 
mentary material. Based on his Table S2, we define three sets 
of terminations which cover just the late Pleistocene: 

• DD: termination times and corresponding uncertainties es¬ 
timated from the depth-derived timescale in H07; 

• MS: termination times and corresponding uncertainty 
equal to the median and standard deviation (respectively) 
of different termination times for each event given in the 
literature Pmbrie et al.[ [1984 [Shackleton et al.j 1990] 


Lisiecki and Raymo 2005] Jouzel et al. 2007 Kawamura 
et al.[[20()7 1; 


^The planktic <5**0 records may not produce a stack as good as benthic 
records because surface water is less uniform in temperature and salinity than 


the deep ocean jLisiecki and Raymo[[2005|. 


ML: termination times as in the MS data set, but with 
larger uncertainties obtained by adding the time uncertain¬ 
ties of the depth-derived time scales in quadrature with the 
corresponding uncertainties in the MS data set. 
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Figure 2: The variation of with time as determined by a depth-derived age-model (HA, HB, HP, and LRH) and an orbital¬ 
tuning model (LR04). The past 2000kyr is divided into two parts: the early Pleistocene extending from 2Ma to 1 Ma and the 
late Pleistocene extending from 1 Ma to the present. The deglaciations we identify for each data set are show in red: the point is 
the mean time, the error bar is the uncertainty. In the late Pleistocene, we identify three additional sets of terminations: the DD 
terminations are denoted by blue lines while the ML/MS terminations are denoted by green lines. These each consists of 12 major 
terminations, which are indicated by the numbers (we use the convention of splitting termination 3 into two events). What we call 
minor terminations are all the red points which are not major terminations. 


These terminations are shown as vertical lines in Figure]^ 

In addition to these major terminations, there are also minor 
terminations characterized by transitions from moderate glacial 
to moderate interglacial conditions. Considering the ambiguity 
in defining these ( [Huybers and Wunsch||2005||Lisiecki[|2010| l, 
we identify terminations in our records using the method 
of H07. A termination is identified when a local maximum and 
the following minimum (defined as a maximum-minimum pair) 
have a difference in larger than one standard deviation of 
the whole record. The time of a termination is the mid¬ 
point of the maximum-minimum pair and the age uncertainty of 
this mid-point is calculated from a stochastic sediment accumu¬ 
lation rate model (Huybers 2007| l. We identify sustained events 
in all data sets by filtering with different moving-average 
(or "Hamming") filters. The data sets are show in Figure]^ We 
use the term “major terminations” to refer to terminations iden¬ 
tified in these data sets which coincide with the major termina¬ 
tions in the DD, MS, or ML data sets. All other terminations 
we refer to as minor terminations. The data on these are listed 
in Table [T] 


Finally, we also define three additional hybrid data sets. As 
the HA data set is a stack of both benthic and planktic records, 
we combine the early-Pleistocene terminations identified from 
the HA data set together with late-Pleistocene terminations 
from the DD, ML, and MS data sets to generate the HADD, 
HAML, and HAMS data sets, respectively. 


Thus starting from our five original data sets (HA, HB, HP, 
LR04, LRH), we have a total of 11 data sets of glacial termi¬ 
nations against which we will compare our models (see Table 

0 - 

As there are dating errors and identification uncertainties, we 
cannot know exactly when a deglaciation occurred. To take into 
account these uncertainties, we treat the time of each deglacia¬ 
tion probabilistically by defining a Gaussian distribution with 
the mean and standard deviation equal to the time and time un¬ 
certainty (respectively) of the termination. The terminations in 
a data set are therefore represented as a sequence of Gaussians, 
which will be modeled as described in the following section. 


3. Bayesian modelling approach 


We use the standard Bayesian probabilistic framework (e.g. 


Kass and Raftery |1995[ |Jeffreys[ |1961[ |MacKay| |2003[ [S 


ivia 


and Skilling 2006| l to compare how well the different models 
explain the paleontological data. This approach takes into ac¬ 
count the measurement errors, accounts consistently for the dif¬ 
fering degrees of complexity present in our models, and com¬ 
pares models symmetrically. Our specific methodology is out¬ 
lined briefly in this section. It is described in more detail in 
|Bailer-Jones| p011a|b[ ), where we also present arguments why 
this approach should be preferred to hypothesis testing using 
p-values. 
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Table 1; Terminations (major and minor) identified from different records using H07’s method (HA, HB, HP, LRCH and LRH) 
and the DD, MS and ML data sets of major terminations. Combining the early Pleistocene terminations of HA with the DD, MS 
and ML data sets, we obtain the hybrid data sets of HADD, HAMS and HAML. For each column, the termination ages are listed 
on the left side and the age uncertainties are listed on the right side (also see Figure |^. All quantities are in units of kyr. 



HA 

HB 

HP 

LR04 

LRH 

DD 

MS 

ML 


-10 

0.81 

-10 

0.81 

-11 

1.9 

-12 

2.2 

-12 

2.2 

-11 

1.9 

-13 

1.8 

-13 

3.1 


-127 

5.3 

-127 

5.3 

-127 

5.3 

-131 

6.3 

-125 

5 

-124 

5 

-128 

3.6 

-128 

6.6 


-209 

6.6 

-209 

6.6 

-209 

6.6 

-219 

7.5 

-208 

6.4 

-208 

6.4 

-218 

4.3 

-218 

8.7 


-233 

6.4 

-233 

6.4 

-233 

6.4 

-245 

7 

-233 

6.4 

-231 

6.3 

-244 

4.8 

-244 

8.6 


-323 

6.8 

-321 

7 

-323 

6.8 

-290 

7.5 

-321 

7 

-326 

7 

-337 

4.5 

-337 

9.8 


-415 

7.4 

-415 

7.4 

-415 

7.4 

-335 

8.4 

-413 

7.6 

-423 

7.1 

-421 

4.4 

-421 

8.2 

Late 

-537 

6.5 

-535 

6.6 

-537 

6.5 

-531 

7.3 

-581 

6.9 

-622 

5.8 

-621 

2.7 

-621 

6.4 

Pleistocene 

-581 

6.9 

-581 

6.9 

-537 

6.5 

-531 

7.3 

-581 

6.9 

-622 

5.8 

-621 

2.7 

-621 

6.4 

(between 

-621 

5.8 

-621 

5.8 

-601 

6.4 

-581 

6.9 

-621 

5.8 

-714 

4.5 

-712 

7.5 

-712 

00 

bo 

1 

-705 

5.9 

-705 

5.9 

-622 

5.8 

-621 

5.8 

-705 

5.9 

-794 

3.7 

-793 

1.8 

-793 

1.8 

and 0 Ma) 

-743 

5 

-742 

4.8 

-705 

5.9 

-708 

5.4 

-741 

4.5 

-864 

5.7 

-864 

0.84 

-864 

5.8 


-789 

4.2 

-789 

4.2 

-745 

5.5 

-743 

5 

-788 

4.2 

-957 

5.8 

-958 

1.7 

-958 

6.0 


-866 

5.8 

-866 

5.8 

-787 

4.1 

-791 

4.1 

-865 

5.7 








-911 

6 

-911 

6 

-845 

8 

-867 

5.7 

-912 

6 








-955 

5.9 

-955 

5.9 

-865 

5.7 

-915 

5.9 

-955 

5.9 








-996 

5.5 

-996 

5.5 

-955 

5.9 

-959 

5.7 

-978 

7 














-983 

6.5 










-1029 

5.6 

-1029 

5.6 

-1030 

5.6 

-1031 

5.5 

-1027 

5.5 








-1080 

6.6 

-1080 

6.6 

-1075 

6.1 

-1085 

6.5 

-1079 

6.5 








-1111 

8.1 

-1111 

8.1 

-1109 

8 

-1117 

8 

-1109 

8 








-1170 

10.4 

-1171 

10.5 

-1149 

9.9 

-1192 

11.4 

-1172 

10.5 








-1235 

11.7 

-1234 

11.7 

-1173 

10.5 

-1244 

12 

-1234 

11.7 








-1279 

12.3 

-1279 

12.3 

-1235 

11.7 

-1285 

12.3 

-1278 

12.3 








-1316 

12.9 

-1316 

12.9 

-1279 

12.3 

-1325 

12.7 

-1317 

13 








-1358 

13.2 

-1358 

13.2 

-1324 

12.7 

-1363 

13.1 

-1359 

13.2 







Early 

-1403 

13.3 

-1403 

13.3 

-1353 

13 

-1405 

13.2 

-1405 

13.2 







Pleistocene 

-1445 

13.4 

-1445 

13.4 

-1407 

13.2 

-1447 

13.3 

-1445 

13.4 







(between 

-1485 

13.2 

-1485 

13.2 

-1449 

13.2 

-1493 

12.9 

-1485 

13.2 







2 

-1521 

12.9 

-1521 

12.9 

-1481 

13.1 

-1529 

12.5 

-1521 

12.9 







and 1 Ma) 

-1560 

12.9 

-1559 

12.4 

-1521 

12.9 

-1569 

12 

-1561 

12.3 








-1641 

10.8 

-1642 

10.8 

-1562 

12.3 

-1609 

11.5 

-1608 

11.5 








-1688 

9.8 

-1689 

9.8 

-1607 

11.5 

-1644 

10.7 

-1641 

10.8 








-1741 

7.4 

-1741 

7.4 

-1640 

10.8 

-1694 

9.4 

-1690 

9.7 








-1783 

6.9 

-1783 

6.9 

-1742 

7.4 

-1743 

7.3 

-1741 

7.4 








-1855 

7.7 

-1855 

7.7 

-1784 

7 

-1783 

6.9 

-1855 

7.7 








-1897 

7.3 

-1897 

7.3 

-1820 

6.9 

-1859 

7.6 

-1855 

7.7 








-1940 

5.8 

-1940 

5.8 

-1856 

7.7 

-1940 

5.8 

-1941 

5.9 












-1893 

7.1 
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The posterior probability of a model M postulated to describe 
a data set D is given by the rules of probability as 


P(M\D) = 


P(D\M)P(M) 

P(D) 


( 1 ) 


where P{M) is the prior of model M, and P(D) can be consid¬ 
ered here as a normalization constant. P(D\M) is the evidence 
of model M which can be written mathematically as 


P(D\M) = 


f 


P(Dlff,M)P(fflM)d0 . 


( 2 ) 


0 is the set of parameters of model M, P(D\0, M) is the likeli¬ 
hood - the probability of observing the data D given specific 
values of the model parameters - and P{0\M) is the prior distri¬ 
bution of parameters of this model. 

Ideally we would be interested in evaluating the P(M\D) for 
different models, as this is the probability of a model being true 
given the observed data. However, this would require that we 
define all possible models. Thus in practice we compare models 
by looking at the ratio of model posterior probabilities. If we 
cannot (or choose not to) distinguish between models a priori, 
then we set P{M) to be equal for all models. It follows from 
equations[T]and|^that this ratio for models Mi and M 2 is 

P{Mi\D) _ P{D\Mi) _ f P(DI0uMi)P(0ilMi)d0i 
P(M2lD) ~ P(DIM2) ~ f P(DI02,M2)P(02lM2)d02' 


The above ratio of the evidences is called the Bayes factor and 
is used to compare how well a model (relative to another model) 
predicts the data, independent of the values of the model param¬ 
eters. Note that this does not involve tuning the model parame¬ 
ters, which is why using the evidence takes into account differ¬ 
ing model complexities. A (maximum) likelihood ratio test, in 
contrast, automatically favors more complex models (e.g. ones 
with more parameters), because such model can be tuned to fit 
the data better without them suffering any penalty on account 
of their increased complexity: an arbitrarily complex model 
will fit the data arbitrarily well. The evidence automatically 
balances model complexity against fitting accuracy to find the 
most plausible model, as described in the above references. 

If we had good reasons to adopt unequal model priors (i.e. 
other information favored one model over another), then we 
should instead look at the product of the Bayes factor with the 
ratio of these priors, but this is not done here. 

To account for the time uncertainties in the glacial termina¬ 
tions, we interpret a termination time as a Gaussian measure¬ 
ment model 

i^(f;|T^) = —(4) 
V^O-; 

where tj is the measured time of termination j (identified from 
a stacked record), ctj is the estimated uncertainty in that 
measurement and tj is the (unknown) true termination time. 

If D comprises N independently measured events, then the 
probability of observing the complete data set D - {tf is just 


the product 


P{D\0,M) = Y\Pitj\0,M) 


nX P(tj\Tj)P(Tj\0,M)dTj 


( 5 ) 


where the second line just follows from the marginalization rule 
of probability. P(f/|0, M), the event likelihood, is the probabil¬ 
ity that an event (termination) j is observed at time tj. It is 
equal to the integral of the product of the measurement model 
with the model-predicted probability of the true time of the 
event, P(Tj\0,M), over all values of the true time. That is, we 
marginalize (average) over the unknown true time. (This is ex¬ 
plained further in section [43] after we have introduced the mod¬ 
els.) 

This model-predicted probability of the times of the events, 
i.e. the deglaciations, is the time series model. This will be 
derived in section l^from the orbital forcing and pacing models. 

We then have all the ingredients we need to calculate the like¬ 
lihood (equation]^, and therefore the evidence (equation]^ for 
a given time series model for a given data set. Both the likeli¬ 
hood calculation and the evidence calculation involve an inte¬ 
gral. We perform these numerically. The former is one dimen¬ 
sional (over time), so is straightforward. The latter is multi¬ 
dimensional (over the model parameters), so we use a Monte 
Carlo method. This involves drawing parameter samples from 
the parameter prior distribution, P{0\M), calculating the likeli¬ 
hood for each, and then averaging the result. In each case we 
draw 10^ samples. 

The Bayes factor is a positive number. The larger it is com¬ 
pared to unity, the more we favor model 1 over model 2. Based 
on the criterion given by |Kass and Rafter^ ( |1995| l, we conclude 
that model 1 should be favored over model 2 if the Bayes factor 
is more than 10 (and 2 over 1 if it is less than 0.1). If the Bayes 
factor lies between 0.1 and 10, we cannot favor either model. 


4. Time series models 


In section 4.1 we introduce various climate forcing models, 
such as those based on variations of the Earth orbital parame¬ 
ters. In section we define the pacing models. We use this 
term in a somewhat narrower sense than is often used in the lit¬ 
erature (Saltzman et al. 1984[ Tziperman et al.|| 2006| l. Here a 
pacing model is one which modulates the effect of a continu¬ 
ously variable forcing mechanism through the introduction of 
a threshold. Specifically, the ice volume is unaffected by the 
forcing mechanism until the ice volume exceeds some thresh¬ 
old, where the value of this threshold depends on the magnitude 
of the forcing. Having defined the forcing and pacing models, 
we use them in section [43l to predict a sequence of glacial ter¬ 
mination times. For a given forcing/pacing model M, and val¬ 
ues of its parameters 0, this is the term P(Tj\0, M) in equation]^ 
In section we will compare these model-predicted termina¬ 
tions with the measured ones, using the the Bayesian approach 
to compare the overall ability of the models to explain the data. 
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4.1. Forcing models 

Insolation influences the climate in a number of ways, both 
directly through mechanisms such as heating the lower atmo¬ 
sphere, and indirectly through modifying the ice accumula¬ 
tion rate and other mechanisms (|Berger[ 1978bJ^ Saltzman and 
Maasch 1990| l. Mainstream thinking holds that climate change 
is most sensitive to the northern summer insolation at high lat¬ 
itudes because the temperature in continental areas, of which 
there is more in the northern hemisphere, is critical for ice melt¬ 
ing or sublimation ( |Milankovic| |1941[ ). The summer insolation 
at high latitudes depends on the geometry of the Earth’s orbit 
and the inclination of Earth’s spin axis, and thus depends on 
the eccentricity, precession, and obliquity (hereafter referred to 
collectively as “orbital elements”, even though obliquity is not 
orbital). Variations in these alter how the insolation varies with 
season (from orbital and axial precession), with latitude (from 
obliquity changes), and with time scale (e.g. eccentricity varia¬ 
tions occur at dominant periods of lOOkyr and 400 kyr). 

Milankovitch proposed that the combination of orbital ele¬ 
ments which gives rise to the measured summer insolation at 
65°N is crucial to generating the glacial-interglacial cycles (|Mi- 
lankovic 1941[ Hays et al. 1976| l. To model orbital forcings 
more generally, we define an orbital forcing model, /(f), as a 
combination of eccentricity, precession, and obliquity, which is 
proportional to the insolation over certain time scales, seasons, 
and latitudes. We build the following forcing models based on 
the reconstructed time-varying eccentricity, precession, 

/p(f), obliquity, fiit), and four different combinations thereof: 


/eCO = e{t) 

/p(f) = e(f) sin(w(f) -/) 
hit) = e(f) 

/Ep(f) = a^'^hit) + (1 - (6) 

/ET(f) = a^'^hit) + (1 - af^^hit) 

/pT(f) = + (1 - a^'^hit) 

/EPT(f) = +y8‘^Vp(f) + (1 - a -yS)i/VT(f), 


where e(f), e(f), and e{t) sin(w(f) - /) are the eccentricity, obliq¬ 
uity, and precession index (or climatic precession), respectively. 
co{t) is the angle between perihelion and the vernal equinox, and 
/ is a parameter controlling the phase of the precession. We 
use the variations of these three orbital elements over the past 
2Myr as calculated by Laskar et al. ( 2004] l. We standardize 
each of /^(f), /p(f), and /r(f) to have zero mean and unit vari¬ 
ance, and then combine them to generate the compound mod¬ 
els. a and /? are contribution factors which determine the rela¬ 
tive contribution of each component in the compound models, 
where 0 < a < 1 and 0 < / < 1. In addition to these models, 
we also use the daily-averaged insolation at 65°N on July 21 as 
a proxy for the Milankovitch forcing, /cme- 

Beyond orbital forcings, we also consider the influence of 
variations of the Earth’s orbital inclination and of the cosmic 
ray flux. To do this we build an inclination-based forcing 


model, fincit), using the orbital inclination calculated by Muller 


and MacDonald (19971, and we model the cosmic ray forcing 


as a geomagnetic paleointensity (GPI) time series (standardized 
to the mean and unit variance), fc{t), as collected by|Channell 
[etar] ( |2009] l. 

All forcing models and corresponding prior distributions 
over their parameters (“forcing parameters”) are shown in Ta¬ 
ble 1^ In this table and the following sections, all parameters 
are treated as dimensionless variables by setting the time unit 
to be 1 kyr (ice volume is on a relative scale). For the preces¬ 
sion model, we set / = 0 to treat precession according to the 
Milankovitch theory (although in section we will allow the 
phase of the precession to vary in order to check the sensitivity 
of our results to this assumption). As we do not have any prior 
information about the values of the contribution factors in the 
compound models, we adopt uniform prior distributions over 
the interval [ 0 , 1 ] for these. 

Figure]^ shows the single-component forcing models (which 
do not have any adjustable parameters). All forcing models will 
be included in pacing models and corresponding termination 
models in the following sections. Hereafter, for each forcing 
model, the corresponding pacing and termination models share 
the same name as shown in the first column of Table |2l 


4.2. Pacing models 

As described earlier, we use the term “pacing” to mean that 
some aspect of the climate system is independent of external 
forcings until the climate system reaches a threshold, whereby 
the value of this threshold is dependent upon the forcing. We 
model the pacing effect on ice volume variations using the de¬ 
terministic version of the stochastic model introduced by|Huy-| 
|bers and Wunsch| ( |2005| l. In that model the ice volume at time t 
is 


v(t) — v(t-At) + T](t) and if v(t) > h(t) then terminate, (7) 


where 


hit) ^ ho- afit). 


( 8 ) 


and At is a constant time interval. Thus the ice volume changes 
in discrete steps until it passes a threshold h{t), which is itself 
modulated by a climate forcing /(f) with a contribution factor 
a. The initial ice volume is vq and the background threshold, 
ho, is either a constant or can itself vary with time. We set 
rjit) to be unity while the threshold has not been reached; af¬ 
ter that the glaciation is terminated by setting rjit) to a constant 
negative value such that the ice volume linearly decreases to 
0 within 10 kyr of the threshold having been exceeded]^ After 
this 77 (f) is set to unity, the next cycle starts. The threshold and 
the deglaciation duration are chosen to generate approximately 


100 and 41 kyr glacial cycles (Huybers and Wunsch 20051. If 
the contribution factor a is zero, the ice volume will vary with 
a period modulated by the background threshold, ho. We de¬ 
fine this model as the Periodic model. In general the period 
may vary with time. However, if ho is constant, then the Peri¬ 
odic model has a constant period of value ho + 10 kyr. Because 


^In practice the ice volume can go slightly negative due to the finite value 
of At, but this is of no practical consequence. 
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Table 2; The termination models and corresponding forcing models and parameters. In addition to any forcing model parameters 
listed, the termination models have pacing parameters and the background fraction parameter. The prior distributions of these 
parameters are described in sections 4.1 4.2 and 4.3 respectively. 


Termination 

model 

Description 

Porcing 

model 

Porcing model 
parameters 

Periodic 

100 kyr pure periodic model 

None 

— 

Eccentricity 

Eccentricity 

/E(f) 

— 

Precession 

Precession 

/p(f) 


Tilt 

Tilt or obliquity 

/T(f) 

— 

EP 

Eccentricity plus Precession 

/ep(0 

a, (p 

ET 

Eccentricity plus Tilt 

/et(0 

a 

PT 

Precession plus Tilt 

/pt(0 

a, (p 

EPT 

Eccentricity plus Precession plus Tilt 

/ept(0 

O', A ^ 

CMP 

(Classical) Milankovitch forcing 

/cMF(f) 

— 

Inclination 

Inclination 

/inc(f) 

— 

GPI 

Geomagnetic paleointensity 

fait) 

— 



Time/kyr 


Figure 3: The single-component forcing models. A deglaciation is likely to be triggered by a peak in the forcing. The values of 
eccentricity, precession, obliquity and Milankovitch forcing (CMF) are calculated by |Laskar et al.| (2004|l, the orbital inclination 
relative to the invariable plane is given by . ' ‘ ' “ “ ‘ 


Muller and MacDonald 


(1997 I, and the GPI record is from Channell et al. (2009 i. 


/!() controls the period of ice volume variations, different val¬ 
ues of /lo are required to model the 100 kyr cycles in the late 
Pleistocene and the 41 kyr cycles in the early Pleistocene (see 
Figure]^. We therefore first build pacing models to separately 
predict the deglaciations over the early and late Pleistocene us¬ 
ing the constant background threshold model. We then use a 
varying background threshold (either linear or sigmoidal) to try 
to model the whole Pleistocene. We now describe these models 
in more detail. 


4.2.1. Constant background threshold 

A constant background threshold is appropriate for modeling 
glacial-interglacial cycles without a transition such as the MPT. 
One realization of such a pacing model with the threshold mod¬ 
ulated by a PT forcing model is shown in Figure]^ The ice vol¬ 
ume grows until it passes the forcing-modulated threshold. The 
ice volume then decreases rapidly to zero within the next 10 kyr. 
We see that a deglaciation tends to occur when the insolation is 
near a local maximum. Hence the pacing model (equations [7] 
and|^ can generate ~100kyr saw-tooth cycles which enables a 
forcing mechanism to pace the phase of these cycles. 

The pacing model has three parameters; vq, ho, a. Rather 
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Constant threshold: 
ho=30 if t>-1000, 
ho=90 otherwise 
Linear threshold: 
p=0.03, q=110 

Sigmoid threshold: 

k=110, t=200,to=-1000 

Sigmoid threshold: 

k=110, t=100,to=-1000 

Sigmoid threshold: 

k=110. t= 200, to=-800 


Time/kyr 

Figure 4; Effect of the threshold in the pacing model. Different values of the threshold, /!o(f), are shown: constant (red), linear 
(green), sigmoidal (blue, cyan, black). The legend shows the values of the parameters of the linear and sigmoid background 
thresholds according to equations [9]and[T^respectively. The Periodic model is achieved using a constant threshold over some time 
span. By changing it from - 30 in the early Pleistocene to ho - 90 in the late Pleistocene, we can reproduce an abrupt change in 
the period of ice volume variations from ~41 kyr to ~100kyr. 



Figure 5: A pacing model with threshold h{t) modulated by the PT forcing model with a - 0.5 and (p - Q (equation]^. The 
pacing model parameters are: background threshold ho - 90; initial ice volume vq = 25; contribution factor of forcing a - 25. 
The dashed line denotes the constant threshold, and the grey line represents the threshold modulated by the PT forcing model, i.e. 
h(f) - ho - a /pt(0 a - 0.5, (p - 0). 


than hxing these to some expected values, we assign a proba¬ 
bility distribution to them. This is the prior which appears in 
equation which shows that by averaging the likelihood over 
values drawn from this prior we get the evidence for the model. 

As described above, a periodic pacing model is generated by 
adopting a constant threshold, h{t) - ho and a - 0. When 
forcings are added onto the constant threshold (to give a + 0), 
the ice volume variations then have an average period of about 
(ho -H 10 - a) kyr, because ice volume accumulation tends to ter¬ 


minate at a forcing maxima. For this reason we use different 
prior distributions on a and ho depending on whether we are 
trying to model the early (41 kyr cycles) or late (100 kyr cycles) 
Pleistocene. Specihcally, we use prior distributions for vq , ho, 
and a which are uniform over the following intervals (and zero 
outside): 0 < vq < 907, 907 < h) < 1307, 157 < a < 357, 
where y - 0.4 when we model ~41 kyr cycles and 7=1 when 
we model ~100kyr cycles. The range of vq is just the range of 
the ice volume variation, while the mean values of the prior dis- 
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tributions of ho and a with y = 1 are the htted values obtained 
by Huybers ( 201 l| l. For the periodic model, a is zero and ho has 
a uniform prior distribution over 7Qy < ho < IlOy. In section 
1 ^ we will check how sensitive our results are to this choice of 
priors. 


4.2.2. Linear trend background threshold 

The constant background threshold model is incapable of 
modeling the transition from the 41kyr world to the lOOkyr 
world. If we treat ho as a step function as shown in Figure 
(red lines), the corresponding pacing model predicts an abrupt 
MPT with an extra parameter (the time of the transition). But 
to model the MPT, we will introduce another two versions of 
the pacing model by allowing the background threshold to vary 
with time (linearly and nonlinearly). 


Studies have suggested various mechanisms which may be 


involved in climate change before and after the MPT (Saltzman 

et al. 

1984 

Maasch and Saltzman 1990| |Ghil| |1994 Raymo 

et al. 

1997 

Paillard 1998 Clark et al. 

1999 ITziperman and 

Gildo] 

2003 Ashkenazy and Tziperman 

200411 . F107 suggests 


that a simple model with a threshold modulated by obliquity 
and a linear trend can explain changes in glacial variability over 
the last 2Myr without invoking complex mechanisms. To in¬ 
vestigate this, we replace the threshold constant ho with a linear 
trend in time 

ho ^ pt + q, (9) 

where p and q are the slope and intercept of the trend respec¬ 
tively. To predict the transition from 41 kyr cycles to 100 kyr cy¬ 
cles with reasonable parameter sets, we adopt the following uni¬ 
form prior distributions for the pacing parameters: 0 < vo < 36, 
0 < p < 0.1, 106 < q < 146 and 10 < a < 30. For the Periodic 
model we use a = 0 and a uniform prior for q between 86 and 
126. These ranges are adopted so that the pacing model predicts 
the 41 kyr and 100 kyr cycles with similar period uncertainties 
as produced by the ranges of parameters in the pacing model 
with a constant background threshold (section pl.2.1| l. 

An example of the linear trend is shown with the green line in 
Figure 1^ If the threshold is not modulated by any forcing (i.e. 
a - 0, the Periodic model), then the pacing model generates a 
gradual transition from 50 kyr cycles 2Ma to 110 kyr cycles at 
the present. 


4.2.3. Sigmoid trend background threshold 

To enable a more rapid onset of the MPT, we introduce an¬ 
other version of the pacing model with a nonlinear trend in the 
background threshold, dehned using the sigmoid function as 

ho = 0.6k/( 1 H- e-('-'»)T) -h 0.4k, (10) 


where k is a scaling factor, to denotes the transition time, and 
T represents the time scale of the MPT. The uniform priors of 
the parameters of this version of pacing models are set to be: 
0 < Vo < 36, 90 < k < 130, 10 < r < 500, 10 < a < 30, 
and -700 < to < -1250, as motivated by the range of MPT 
time given by Clark et al. ( 2006| l. For the Periodic model we 
set a = 0 and change the range of k to be 70 < k < 110. The 


reason for choosing these priors is the same as given in section 

WT2\ 

Figure illustrates this model. A late transition time, to, 
moves the trend to the present, and a smaller transition time 
scale, T, generates a more rapid transition. The values of 0.6k 
and 0.4k in the above equation are set in order to rescale the 
trend model such that the ice volume threshold including a sig¬ 
moid trend allows both ~41 kyr and ~ 100 kyr ice volume vari¬ 
ations. 


4.3. Termination models 


Using the same method described in section for the data, 
we identify glacial terminations in the ice volume time series 
generated by the pacing models. The age uncertainty of each 
termination is equal to half of the duration of the termination. 
As with the data, a single termination is represented as a Gaus¬ 
sian probability distribution over time, which is just the term 
P{Tj\0, M) in equation (see section [^. The full set of pre¬ 
dicted terminations forms the time series model which we will 
compare with the data. We use the term “termination model” to 
refer to the combination of a forcing model and a pacing model, 
which together has a number of parameters. These are listed in 
Table m Each of these termination models can have different 
background threshold models, as was explained in section |4~2] 

Figure shows schematically how we compare the model- 
derived terminations (red line) with the data on one termination 
(black line). The event likelihood (the integral in equation]^ for 
a termination is calculated by integrating over time the product 
of the probability distribution of the observed time of the ter¬ 
mination, P{tj\crj, Tj), with the model prediction of the true ter¬ 
mination time, P{Tj\0, M). The product of event likelihoods for 
all terminations in a data set is the likelihood for the termina¬ 
tion model with specihc values of the parameters of the forcing 
and pacing model. By calculating the likelihood for many dif¬ 
ferent values of those parameters (drawn from their prior distri¬ 
butions), and averaging them, we arrive at the evidence for that 
termination model (equation]^. 

To accommodate other contributions from the climate system 
to the timing of a termination, we add a constant background 
probability to the termination model. This is dehned using the 
background fraction b - HhUHt, + Hg), where Hb is the am¬ 
plitude of the background and Hg is the difference between the 
maximum and minimum of the Gaussian sequence. The back¬ 
ground fraction is a parameter of the model which we do not 
measure, so we assign it a prior (uniform from 0 to 0.1) and 
marginalize over this too. 

Let us summarize our modelling procedure. A forcing model 
(Figure modulates the ice volume threshold (equation 
of the pacing model (equation from which the termination 
model (e.g. red line in Figure]^ is derived. This is then com¬ 
pared with a sequence of terminations identihed from a 
data set using our Bayesian procedure. 
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Figure 6: Schematic illustration of the components in the likelihood calculation (equation]^. The red line is the termination model 
generated from the pacing model shown in Figure The black line represents the measured data on termination j. Its time and 
uncertainty are interpreted probabilistically as a Gaussian distribution over time. 


5. Results of the model comparison 


5.1. Evidence and Bayes factor 

We calculate the Bayesian evidence of the termination mod¬ 
els listed in Table |2] for each of the data sets shown in Table [1] 
We calculate this for terminations extending over three different 
time spans; 1 Ma to OMa, 2Ma to 1 Ma and 2Ma to OMa. The 
first time span is the same as that chosen by Huybers P011| l. 
However, other studies claim that the onset of 100 kyr cycles 
occurred around 0.8 Ma. We will examine in section |6] how 
sensitive our results are to the choice of time span. According 
to the time span in question, we need to choose the appropriate 
pacing model, because this determines the dominant period. 

The Bayes factor (BF) is just the ratio of the evidence for two 
models. Rather than reporting Bayes factors for various pairs of 
models, we will report them for all models relative to a simple 
reference termination model. This reference model is just a uni¬ 
form probability distribution over the time of deglaciations, and 
has no parameters. It corresponds to a constant probability in 
time of a deglaciation, but its choice is arbitrary as it just serves 
to put the evidences on a convenient scale. 

Bayes factors should only be used to compare different mod¬ 
els for a common data set. This is because their definition re¬ 
quires that the factor P{D) in equation[^cancels out. 


5.1.1. Late Pleistocene (1-0Ma) 

The deglaciations identified using H07’s method (in the data 
sets HA, HB, HP, LR04, and LRH) contain many minor termi¬ 
nations which may be better explained by models which predict 
~41 kyr cycles. Thus, we choose constant background thresh¬ 
olds with y = 1 and y - 0.4 for all termination models in order 
to predict 100 kyr and 41 kyr variations, respectively, over the 
past 1 Myr. 

The BF for each termination model relative to the uniform 
model is shown in Figure [7] We see that the HA, HB, LR04, 
and LRH data sets favor the models with a tilt component and 
with y - 0.4. Although compound models such as EPT and 


CMF sometimes have BFs slightly higher than the Tilt model, 
precession and eccentricity may not be necessary to explain the 
terminations identified in these data sets. 

The HP data set favors the PT model with y - 1. This could 
be caused by a mismatch between the terminations identified 
in HP and the terminations identified in other data sets. For 
example, around the time of termination 6 (Figure |^, two ter¬ 
minations are identified in HP while only one termination is 
identified in other data sets. The discrepancy between HP and 
other data sets is larger before 0.8 Ma, which indicates a more 
ambiguous definition of terminations, particularly for planktic 
b'^O. On account of this, in section]^ we will narrow the time 
span to 0-0.8 Ma (a more conservative time scale of late Pleis¬ 
tocene). Nevertheless, for all the data sets containing minor 
terminations, tilt is a common factor in the preferred models. 

For the DD, ML, and MS data sets, the PT and CMF models 
with y = 1 are favored. In other words, the major terminations 
are better predicted by a model involving precession and tilt 
rather than either alone, although tilt alone can pace minor ter¬ 
minations. Because the EPT and CMP models have lower BPs 
than the PT model, the eccentricity component is unlikely to 
pace the glacial terminations directly. Yet eccentricity can de¬ 
termine the glacial terminations indirectly through modulating 
the amplitude of the precession maxima (i.e. e sin at). A similar 
conclusion was drawn by |Huybers] ( |2011) using p-values. We 
note that the rejection of a null hypothesis in this way does not 
automatically validate the alternative hypothesis. The Bayesian 
approach allows one to directly compare multiple models in a 
symmetric fashion. 

Since the late Pleistocene is characterized predominantly by 
major terminations, we conclude that late Pleistocene climate 
change is paced by a combination of obliquity and preces¬ 
sion. This does not automatically imply that there is no link 
between major terminations and eccentricity variations. Eccen¬ 
tricity may determine the 100 kyr cycles in the late Pleistocene, 
while obliquity and precession influence the exact timing of the 
terminations (|Lisiecki[ |2010[). This could be studied in future 
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work by introducing an eccentricity dependence into the pacing 
model. 

5.1.2. Early Pleistocene (2-1 Ma) 

Here we only consider the HA, HP, HB, L04, and LRH data 
sets, because the DD, ML, and MS sets have no terminations in 
the early Pleistocene. We only calculate BFs for models with 
y - 0.4 (and not y = 1), because this reproduces periods on 
the order of 41 kyr, and such cycles are obvious in all data sets 
(Figure]^. We exclude the GPI model because the GPI record 
has a time span less than 2Myr. The BFs for the termination 
models are shown in the upper right panel of Figure [7] 

We see that the Tilt model is favored by all data sets. The 
combination of tilt with other orbital elements does not give a 
higher BF, so we conclude that the other orbital elements do 
not play a major role in pacing the deglaciations over the early 
Pleistocene. It is important to realise that although the Bayesian 
evidence generally penalizes more complex models, this does 
not automatically result in a lower Bayes factor for such mod¬ 
els. They can achieve higher Bayes factors if the model is sup¬ 
ported by the data sufficiently strongly (see the references in 
section]^. 

5.1.3. Whole Pleistocene (2-0 Ma) 

For the whole Pleistocene we use the data sets HA, HB, HP, 
LR04, and LRH as well as the hybrid data sets, HADD, HAML, 
and HAMS. We use pacing models with and without a trend 
threshold to model the terminations. The BFs for the above 
models and data sets are shown in the lower left panel of Figure 
0 

For the HA, HB, and LR04 data sets, the Tilt model with 
y = 0.4 is favored. Other combinations with the tilt component 
and with y = 0.4 yield similar BFs. However, for the HP and 
LRH data sets, the PT model with a sigmoid trend is favored 
and this model also gives high BFs for the HA, HB, and LR04 
data sets. For all these data sets, the Precession, Eccentricity, 
and Periodic models have rather low BFs. These results indicate 
a major role for tilt and a minor role for precession in pacing 
major and minor Pleistocene deglaciations. For all the above 
data sets, the CMF model with y = 0.4 has a high BF, but 
not higher than other models with a tilt component. CMF is 
an optimized version of the EPT model. Eaced with different 
models which give similar Bayes factors, we will normally want 
to choose the simplest, which here is PT. We will investigate 
this further in section |6] 

Eor the HADD, HAML, and HAMS data sets, the PT model 
with a threshold modulated by a sigmoid trend is favored, and 
those compound models with a tilt component also have high 
BEs. The whole Pleistocene deglaciations appear to be paced 
by the combination of precession and obliquity. This is con¬ 
sistent with the results for the late-Pleistocene deglaciations. 
The physical reason why precession becomes important after 
the MPT is beyond the scope of our work and is still under de¬ 
bate. 

On account of the existence of the MPT, modeling the whole 
Pleistocene with a constant background threshold model makes 
little sense, so those corresponding results should not be given 


much weight. (This corresponds to assigning all those models 
a smaller model prior, P{M).) More appropriate are the models 
with linear and sigmoid background thresholds. Among these, 
we see that the EPT and CMP models have BEs about ten times 
lower than the PT model. We conclude that eccentricity does 
not play a significant role in pacing terminations over the whole 
Pleistocene. We also find that the PT model with a sigmoid 
background threshold is more favored than the PT model with 
a linear background threshold, which indicates that the MPT 
may not be as gradual as claimed by ( |Huybers]|2007| l. We will 
discuss this further in section|6] 

According to Pigure 0 the Inclination and GPI models are 
not favored, and in fact are less favored than the reference uni¬ 
form model (as BP<1). Thus we find that the geomagnetic pa- 
leointensity does not pace glacial cycles over the last 2 Myr, al¬ 
though we note that there is some controversy over the link be- 


tween the GPI and climate change (Courtillot et al. 

2007||Pier- 

rehumbert 2008 Bard and Delaygue 

|2008 Courtillot et al. 

2 OO 8 I 1 . In contrast to the conclusion of 

IMuller and MacDonald 


( 1997|l, there we find no evidence for a link between the orbital 


inclination and ice volume change. 

5.2. Discrimination power 

To validate our method as an effective inference tool to select 
out the true model, we generate simulated data from each model 
and then evaluate the Bayes factors for all models on these data. 
The data are simulated with the following parameters for all 
models except the Periodic one: ho = IlOy, a = 25y, b - 0, 
and Vo = 45y, where y = 1 for terminations simulated over 
the last 1 Myr and y = 0.4 for the time range 2 to 1 Ma. Eor 
the Periodic model we use instead of Hq - 90y and of course 
a = 0. Recall that the period of the resulting time series is 
approximately /zq H- 10 - a. Other parameters in corresponding 
forcing models are fixed at a = 0.5 for compound models with 
two components, a - 0.3 and [3 - 0.2 for the EPT model, and 
(p -0 for models with a precession component. 

The BEs for simulated data over the last 1 Myr are shown 
in the left panel of Pigure 0 We see that all models based on 
a single orbital element are correctly selected, although those 
models combining the correct single orbital element with other 
elements may also give comparable BEs. When models have 
similar BEs we would generally want to favor the one with 
fewest components. This again corresponds to using a larger 
value of the model prior, P{M) (see sectional. 

Incorrect models, in contrast, generally receive much lower 
Bayes factors. Eor the PT-simulated data set, the PT model 
is correctly discriminated from the CMP model (a fitted EPT 
model). We also see that although the ET model may not be 
correctly selected out when its BP is similar to that obtained for 
EP, PT, EPT, and CMP models, the ratios of the Bayes factors 
are close to unity. The much larger ratios between them for the 
real data validate our inference of the ET model (see section]^. 
Pigure0shows that the EP model is not favored over the Eccen¬ 
tricity model even when the former is the true model. However, 
the Eccentricity model is never favored on any of the real data 
sets, so this misidentification does not occur in practice. In con¬ 
clusion, this discrimination test indicates that our identification 
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Figure 7: The Bayes factors relative to the uniform model for terminations occurring over the past 1 Myr (upper left), from 2Ma to 
1 Ma (upper right), over the past 0.8 Myr (lower left), and over the past 2 Myr (lower right). The logarithm of the Bayes factor is 
shown on a color scale for each model (vertical axis) and data set (horizontal axis). Upper left panel; the models above and below 
the white line have constant background threshold defined by y = 1 and y = 0.4, respectively. Upper right panel; all models have 
a constant background threshold defined by y = 0.4. Lower left panel; Same as the upper left panel but for terminations over the 
past 0.8 Myr. Lower right panel; the upper, middle, and lower blocks (of ten models each, separated by the white line) use a linear 
trend, a sigmoid trend, and a constant background threshold (respectively) with y = 0.4. In all panels except the top right one, the 
data sets on the left side of the dashed line include minor late-Pleistocene terminations while the data sets on the right side do not. 
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Figure 8: Discrimination power. As Figure |^but for simulated data extending over the last 1 Myr (left) and from 2Ma to 1 Ma 
(right). The dashed line indicates the results for the true model for each data set. Ideally this BF would be much higher than all 
other BFs in that column. 


(in section [7. l.l[ l of the PT model as the best model for the late 
Pleistocene is reliable. 

We then apply the same test to the period 2-1 Ma, which uses 
a different value of j as explained above. The results are shown 
in the right panel of Figure We see that the correct model 
always has a larger Bayes factor than the other models. Yet we 
also see that for data simulated from the PT model, the CMF 
and EPT models have similar BFs as the PT model. However, 
as the PT model is not as fine tuned as the CMF model and 
has fewer adjustable parameters than the EPT model, we would 
generally invoke Occam’s razor to select the PT model. 

This experiment confirms that Bayesian model comparison 
and our interpretation of the Bayes factors allows us to se¬ 
lect the correct model. We conclude that tilt (or obliquity) is 
the main “pacemaker” of the deglaciations over the last 2 Myr, 
while precession may pace the deglaciations over the late Pleis¬ 
tocene. This indicates that precession becomes important in 
pacing terminations after the MPT. Other climate forcings, in¬ 
cluding GPI and inclination forcing, are unlikely to pace the 
deglaciations over the Pleistocene. 

6. Sensitivity test 

We now perform a sensitivity test to check how sensitive a 
model’s BP is to the choices of time scale and model priors. 

To do this we first change the time of the onset of the 100 kyr 
cycles from 1 Ma to 0.8 Ma. We recalculate the BPs and show 
them in the lower left panel of Pigure|7] We see that the combi¬ 
nation of obliquity and precession (i.e. the PT model) still paces 
the major terminations (DD, ML, and MS) better than obliquity 
alone. So our conclusion is robust to this change of the late- 
Pleistocene time span. 


We then change the prior distributions over some model pa¬ 
rameters and keep others fixed. We apply this sensitivity test to 
the ML, HA, and HAML data sets with time spans of 1-0 Ma, 
2-1 Ma, and 2-0 Ma, respectively. These three data sets are 
representative and conservative because they contain the major 
terminations as well as minor terminations identified in the HA 
data set, which is stacked from both benthic and planktic data 
sets. In each case we select the most favored types of the pacing 
model according to the model comparison in section They 
are: the constant background threshold with y - I for ML; the 
constant background threshold with y = 0.4 for HA; sigmoid 
background threshold for HAML. Por each model, we change 
the range of the uniform prior on each parameter as follows (the 
name in parentheses is used to label the change in Pigurej^ 


/l = 0^-10</l<10 (lag): Here we account for the 
possible time lag between the forcing and its effect (as was 
suggested by previous studies such as Hays et al.|1976 and 
Imbrie and Imbrie]| 1980)1. A represents the time lag(s) of 


any model listed in Table and A ranges -10 to 10 kyr in 
steps of 1 kyr. Por models with a single component, a time 
lag is achieved by shifting the corresponding time series to 
the past or to the future. Por compound models, each com¬ 
ponent is shifted independently, and the corresponding ev¬ 
idences are calculated by marginalizing the likelihood over 
time lags of all components. 


• 90y < ho < 130y —» 80y < ho < 

I40y (Marge) and lOOy < ho < 120y (hsmall): We extend 
or shrink the upper and lower limits of the background 
threshold, ho, by lOy. Changing the prior distribution of ho 
is equivalent to changing the prior distribution of the pe¬ 
riod of a pacing model, because the average period is about 
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ho + lQ-a (see section [4.2.1| l. The above changes only ap¬ 
ply to models with a while the prior distribution of the 
Periodic model {a — 0) is changed from 70y < Iiq < \ lOy 
first to 60y < /iq < I 2 O 7 and then to SOy < ho < IOO 7 . 
For models with a sigmoid trend, the prior distribution of 
k is changed from 90 < k < 130 first to 80 < A: < 140 and 
then to 100 < k < 120 . 

• ISy < a < 357 —> 5y < a < 457 ifllarge) or 2 O 7 < a < 
3 O 7 (asmall): We extend or shrink the range of contribu¬ 
tion factor of forcing, a, around its mean. These changes 
do not apply to the Periodic model, for which a = 0. 

• 0 < b < Q.l ^ 0 < b < 0.2 (Marge) oxO < b < 0.05 (as- 
mall): We double or halve the upper limit of b, the contri¬ 
bution factor of the background in the termination model. 

• (f>-Q^-7T<(p<7T (phi): We now allow any value for 
the the phase of the precession, <p, which is related to the 
season of the insolation that forces the climate change. 

The BFs for the models with each of the above changes are 
shown Figure [9j separated into three blocks corresponding to 


means that the pacing depends on the overall northern hemi¬ 
sphere summer insolation at a range of northern latitudes (or 
equivalently the duration of the southern summer) rather than 
that at a specific latitude and time in summer. This is consis¬ 
tent with |Huybers|201 1] s conclusion that “climate systems are 
thoroughly interconnected across temporal and spatial scales”. 

We found in section [5. 1.3 [ that the pacing model with a sig¬ 
moid background threshold model was favored when modeling 
the whole Pleistocene. We now identify which parameters of 
that model are most favored by the data. To do this we calcu¬ 
late the marginalized likelihood (relative to the uniform model) 
for the PT model with a sigmoid background threshold as a 
function of both the transition time scale, r, and transition mid¬ 
point, to, on the FIAML data set (i.e. we marginalize over all 
other parameters): see the right panel of Figure [T^ To explore 
this more completely we have extended the upper limit of to 
from -700 kyr to -300 kyr. The peak is at around r = 100 kyr 
(about one glacial-interglacial cycle) and to - -715 kyr. To vi¬ 
sualize this transition, a sigmoid background model with this 
value of T is shown in Figure Defining the transition dura¬ 
tion as the time taken for the ice volume to change from 25% to 
75% of its maximum value, t = 100kyr corresponds to a tran¬ 
sition duration of 220 kyr. This timescale for the MPT is con- 


the different data sets, ML, HA, HAML. Por the ML data set (1- 

sistent with the flndings of Honisch et al. ( 

2009|l; Mudelsee and 

0 Ma; top block), the PT and CMP models are favored over the 

Schulz ( 1997|l; Tziperman and Gildor 

(200311; Martmez-Garcia 


Tilt model for all changes in the priors. The PT and CMF mod¬ 
els without time lags are also favored over corresponding mod¬ 
els with lags. This indicates that the Tilt and Precession models 
pace climate change without significant time lags. Over the 
early Pleistocene (middle block), the Tilt model is marginally 
favored. The BFs of the EPT model vary a lot but are never 
higher than the Tilt model. For the HAML data set (2-0 Myr; 
bottom block), the model combining a sigmoid trend and the 
PT forcing is favored for all changed priors. Moreover, the 
BF for the PT model increases when shrinking the range of 
the background fraction, b. The relative lack of significance 
of the background suggests a significant influence of obliquity 
and precession over the past 2 Myr. 

To further investigate the role of precession in pacing the 
major late-Pleistocene deglaciations, we marginalize the like¬ 
lihoods for the PT model over all its parameters except for 
the contribution factor of precession contribution factor, a, and 
phase, (j). (Note that the evidence is the likelihood marginal¬ 
ized over all model parameters.) We do this for the ML data 
over the last 1 Myr. The distribution of this marginalized likeli¬ 
hood (relative to the uniform model) is shown in the left panel 
of Figure 10 The highest values occur for phases ranging 
from -50° < (p < h- 50°, indicating that the main pacemaker 
under this model is either the intensity of the northern hemi¬ 
sphere summer insolation or the duration of the southern sum¬ 
mer (we cannot distinguish between these based on available 
data). While very small contribution factors, a < 0.1, are 
strongly disfavored, the model is otherwise not very sensitive 
to a. Since a determines the size of the contribution of pre¬ 
cession to the PT model (equation [^, this means that some 
precession contribution is favored, but the exact amount is not 
well constrained. This broad high likelihood range of a and (p 


et al. (201 l| l. It is shorter (more abrupt transition) that found 


by H07 and others ( |Raymo et al.||2004]|Liu and Herbert] |2004t 
|Medina-Elizalde and Lea 2005 |Blunier et al. 1998| l, although 
Eigure 10 shows that longer time scales are not that improba¬ 
ble (but note that the likelihoods are shown on a logarithmic 
scale). The transition time of 715 kyr ago is somewhat later 
than the mid-point of the MPT of —900 kyr identified by |Clark| 
|et al.| ( |2006] l using a frequency spectrogram analysis. Yet our 
data/analysis permits a range of values, although we see that 
the region around -900 kyr is disfavored for low values of t. 
Discrepancies from previous results could also arise from the 
fact that we use just termination data. 

As a final sensitivity test, we change the sign of the contri¬ 
bution factor of forcing, a, to model possible anticorrelations 
between forcing models and the data over the late Pleistocene. 
We And that this significantly reduces the BP for all favored 
models, which shows that models with anticorrelations are a 
poor description of the data. 

7. Summary and conclusions 

Using likelihood-based model comparison, we And that a 
combination of obliquity (axial tilt) and precession is the main 
pacemaker of the 12 major glacial terminations in the late Pleis¬ 
tocene. Obliquity alone can trigger minor terminations over the 
whole Pleistocene. The obliquity and precession pace the Pleis¬ 
tocene terminations without significant time lags, and their pac¬ 
ing roles can be identified with high significance. 

We confirm the dominant role of obliquity in pacing the 
glacial terminations over the early Pleistocene. In contrast to 
the conclusion of H07, we And that a model with obliquity 
alone describes the major and minor Pleistocene deglaciations 
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Figure 9: Sensitivity test. Bayes factors for several models with a change in the range of priors (compared to what was used in 
section |5.1| and Figure [^. These are shown for three different data sets (and time scales) in the three blocks separated by white 
horizontal lines. In each block the logarithm of the Bayes factor is show on a color scale for each model (vertical axis) and change 
in prior (horizontal axis). The first column - labeled ‘none’ - gives the BFs for models with the original priors for reference. Some 
models are not relevant for certain prior changes, so the corresponding slots are empty (white). The three blocks are as follows. 
Top; pacing model with a constant background threshold with y - I for the ML data set (0-1 Ma). Middle; pacing model with 
a constant background threshold with y - 0.4 for the HA data set (1-2 Ma). Bottom; pacing model with a sigmoid background 
threshold for the HAML data set (0-2 Ma). 



Figure 10; The distribution of the logarithm of the marginalized likelihood relative to the uniform model, logjQ(RML), for the PT 
model as a function of two model parameters. The left panel shows the distribution over the precession contribution factor (a) and 
phase (0) for the PT model with y = 1 for the ML data set (the last 1 Myr). The right panel shows the distribution over the transition 
time (fo) and transition time scale (t) for the PT model with a sigmoid background threshold for the HAML data set (the last 2 Myr). 
10^ and 1.6 X 10^ sample points sampled in a 200 x 200 grid were used to construct the left and right distributions respectively. For 
each panel, the most favored region is identified by applying a 25 x 25 grid to the distribution, and is denoted by a cross. Note that 
the scales saturate; likelihoods above or below the limits of the color bar are plotted using the extreme color. 


(together) better than a model which combines obliquity with a 
trend in the background threshold. Thus obliquity is sufficient 
to explain at least the time of minor terminations before and 
after the MPT, without reparameterizing the model as done by 
H07 and |Raymo et al.| ( |1997| l; [Paillard| ( |T998| l; [Ashkenazy alidl 


|Tziperman| ((2004); (Paillard and Parrenin] ((2004); (Clark et al.| 
( (20U6| l. 

We observe that precession becomes important in pacing the 
~100kyr glacial-interglacial cycles after the MPT. Through the 
comparison of models with a linear trend and models with a 
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sigmoid trend in the background threshold, we find that the 
glacial terminations over the whole Pleistocene can be paced 
by a combination of precession, obliquity, and a sigmoid trend 
in the background threshold. Using marginalized likelihoods, 
we find that the MPT has a time scale (the time required for ice 
volume to grow from 25% to 75% of the maximum) of about 
220 kyr and a mid-point at around 715 kyr before the present. 


This is rather late compared with other studies (Clark et al. 


2006|l, although our data/analysis supports a broad range of val¬ 


ues. Note that we do not assume the existence of a strict peri¬ 
odicity in the data, in contrast to some studies based on power 
spectrum analyses. Since there is no significant change in the 
power spectrum of the insolation before and after the MPT, the 
MPT must be caused by a rapid change of response of the cli¬ 
mate to the insolation, rather than by the insolation itself. This 


is consistent with previous studies (Paillard 

1998 

IParrenin and| 

Paillard(|2003||Ashkenazy and Tziperman 

2004 

Clark et al. 

2006f. 


We also find that geomagnetic forcing and forcing by 
changes in the inclination of the Earth’s orbital plane are un¬ 
likely to cause significant climate change over the last 2Myr. 
This weakens the suggestion that the Earth’s orbital inclination 
relative to the invariable plane influences the climate ( |Muller| 
and MacDonald 1997| l. Our results also suggest that the mod¬ 
ulation of cosmic rays or solar activity by the Earth’s mag¬ 
netic field has at best a limited impact on climate change on 
timescales between 10 kyr and 1 Myr, challenging the hypoth¬ 
esis that connects the geomagnetic paleointensity with climate 
change ( [Channel! et al. 2009| ). 

The Bayesian modelling approach is well suited to multiple 
model comparison, because it evaluates all their evidences ex¬ 
plicitly; a model is not selected just because some alternative 
“noise” model is rejected. Uncertainties in the data are also ac¬ 
commodated. Moreover, the approach automatically and con¬ 
sistently takes into account the model complexity, in contrast to 
most other methods (e.g. frequentist hypothesis testing, max¬ 
imum likelihood ratio tests) which will favor more complex 
models unless they are penalized in some ad hoc way. 

Our conclusions are reasonably robust to changes of param¬ 
eters, priors, time scales, and data sets. The main uncertainty 
in our work comes from the identification of glacial termina¬ 
tions over the Pleistocene, although we have used different data 
sets of terminations to reduce this uncertainty. In future work, 
a more sophisticated Bayesian method (e.g. the method intro¬ 
duced by Bailer-Jones||2012| ) could be employed to model the 
full time series of climate proxies. Using this model inference 
approach, we may learn more about the mechanisms involved 
in the climate response to Milankovitch forcings. 
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